Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
Abstract
In this paper, we address the severe performance gap caused by high processor clock rates and slow DRAM accesses. We show that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half of its time stalling for L2 misses. Large cache blocks can improve performance, but only when coupled with wide memory channels. DRAM address mappings also affect performance significantly. We evaluate an aggressive prefetch unit integrated with the L2 cache and memory controllers. By issuing prefetches only when the Rambus channels are idle, prioritizing them to maximize DRAM row buffer hits, and giving them low replacement priority, we achieve a 43% speedup across 10 of the 26 SPEC2000 benchmarks, without degrading performance on the others. With eight Rambus channels, these ten benchmarks improve to within 10% of the performance of a perfect L2 cache.
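The three prefetch policies named in the abstract (issue only when the channel is idle, prefer prefetches that hit the open DRAM row, and insert prefetched blocks with low replacement priority) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's prefetch unit: the `Channel` class, the `fill` helper, the `ROW_BITS` value, and the one-access-per-call timing are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

ROW_BITS = 11  # hypothetical: blocks sharing the upper address bits share a DRAM row


def row_of(block_addr: int) -> int:
    """Map a block address to its (hypothetical) DRAM row number."""
    return block_addr >> ROW_BITS


@dataclass
class Channel:
    """Toy memory channel: demand misses always go before prefetches."""
    open_row: Optional[int] = None
    demand_queue: deque = field(default_factory=deque)
    prefetch_queue: list = field(default_factory=list)

    def next_access(self):
        """Pick one access to issue; return (kind, block_addr) or None."""
        if self.demand_queue:
            # The channel is busy with demand traffic, so no prefetch is issued.
            kind, addr = "demand", self.demand_queue.popleft()
        elif self.prefetch_queue:
            # Channel is idle: prefer a prefetch that hits the open row buffer.
            hits = [a for a in self.prefetch_queue if row_of(a) == self.open_row]
            addr = hits[0] if hits else self.prefetch_queue[0]
            self.prefetch_queue.remove(addr)
            kind = "prefetch"
        else:
            return None
        self.open_row = row_of(addr)
        return kind, addr


def fill(lru_order: list, block_addr: int, kind: str, ways: int = 4) -> None:
    """Fill one cache set; lru_order[0] is the next victim, the last entry is MRU."""
    if len(lru_order) >= ways:
        lru_order.pop(0)  # evict the current victim
    if kind == "prefetch":
        lru_order.insert(0, block_addr)  # low priority: first in line for eviction
    else:
        lru_order.append(block_addr)  # demand fill goes to the MRU position
```

In this sketch a prefetched block that is never referenced is the first one evicted from its set, so a useless prefetch displaces at most one block that was already the replacement victim.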
Similar resources
Storage-Class Memory Hierarchies for Scale-Out Servers
With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature of in-memory services makes seamless integration of SCM in servers questionable. In this paper, we ask the question of how best to introduce SCM for such serv...
A Prolegomenon on OLTP Database Systems for Non-Volatile Memory
The design of a database management system’s (DBMS) architecture is predicated on the target storage hierarchy. Traditional disk-oriented systems use a two-level hierarchy, with fast volatile memory used for caching, and a slower, durable device used for primary storage. As such, these systems use a buffer pool and complex concurrency control schemes to mask disk latencies. Compare this to main me...
Exploiting the Potential of a Network of IRAMs
Recently, a great deal of research has gone into reducing the gap in performance between processors and their memory systems. Techniques such as prefetching have been developed in order to hide the long latencies involved in retrieving data from off-chip DRAM. However, applications with irregular access patterns generally see greatly reduced benefit from these techniques, and latencies are becomi...
DRAM Aware Last-Level-Cache Policies for Multi-core Systems
x latency DTC in two cycles. In contrast, a state-of-the-art DRAM cache always reads the tags from the DRAM cache, which incurs high tag lookup latencies of up to 41 cycles. In summary, high DRAM cache hit latencies, increased inter-core interference, increased inter-core cache eviction, and the large application footprint of complex applications necessitate efficient policies in order to satisfy the ...
Memory Hierarchies in Intelligent Memories: Energy/Performance Design
Dramatic increase in the number of transistors that can be integrated on a chip, coupled with advances in Merged Logic DRAM (MLD) technology fuels the interest in Processor In Memory (PIM) architectures. A promising use of these architectures is as the intelligent memory system of a workstation or server. In such a system, each memory chip includes many simple processors, each of which is assoc...
Publication date: 2001